54 research outputs found

    Vesyla-II: An Algorithm Library Development Tool for Synchoros VLSI Design Style

    High-level synthesis (HLS) has been researched for decades and is still limited to fast FPGA prototyping and algorithmic RTL generation. A feasible end-to-end system-level synthesis solution has never been rigorously proven. Modularity and composability are the keys to enabling a system-level synthesis framework that bridges the huge gap between system-level specification and physical-level design. This implies that 1) modules in each abstraction level should be physically composable without any irregular glue logic involved and 2) the cost of each module in each abstraction level is accurately predictable. The ultimate reasons that limit how far conventional HLS can go are precisely that it cannot generate modular designs that are physically composable and cannot accurately predict the cost of its designs. In this paper, we propose Vesyla, not as yet another HLS tool, but as a synthesis tool that positions itself in a promising end-to-end synthesis framework and preserves its ability to generate physically composable modular designs and to accurately predict their cost metrics. We present how Vesyla is constructed, focusing on the novel platform it targets and the internal data structures that highlight the uniqueness of Vesyla. We also show how Vesyla will be positioned in the end-to-end synchoros synthesis framework called SiLago.

    NACU: A Non-Linear Arithmetic Unit for Neural Networks

    Reconfigurable architectures targeting neural networks are an attractive option. They allow multiple neural networks of different types to be hosted on the same hardware, in parallel or in sequence. Reconfigurability also grants the ability to morph into different micro-architectures to meet varying power-performance constraints. In this context, the need for a reconfigurable non-linear computational unit has not been widely researched. In this work, we present a formal and comprehensive method to select the optimal fixed-point representation to achieve the highest accuracy against the floating-point implementation benchmark. We also present a novel design of an optimised reconfigurable arithmetic unit for calculating non-linear functions. The unit can be dynamically configured to calculate the sigmoid, hyperbolic tangent, and exponential functions using the same underlying hardware. We compare our work with the state of the art and show that our unit can calculate all three functions without loss of accuracy.

    Methodology for Structured Data-Path Implementation in VLSI Physical Design: A Case Study

    State-of-the-art modern microprocessor and domain-specific accelerator designs are dominated by data-paths composed of regular structures, also known as bit-slices. Random logic placement and routing techniques may not result in an optimal layout for these data-path-dominated designs. As a result, implementation tools such as Cadence’s Innovus include a Structured Data-Path (SDP) feature that allows data-path placement to be completely customized by constraining the placement engine. A relative placement file is used to provide these constraints to the tool. However, the tool neither extracts nor automatically places the regular data-path structures. In other words, the relative placement file is not automatically generated. In this paper, we propose a semi-automated method for extracting bit-slices for the Innovus SDP flow. We demonstrate that the proposed method results in 17% lower density (utilization) for a pixel buffer design, while the other performance metrics are unchanged compared to the traditional place-and-route flow.

    Refresh Triggered Computation: Improving the Energy Efficiency of Convolutional Neural Network Accelerators

    To employ a Convolutional Neural Network (CNN) in an energy-constrained embedded system, it is critical for the CNN implementation to be highly energy efficient. Many recent studies propose CNN accelerator architectures with custom computation units that try to improve the energy efficiency and performance of CNNs by minimizing data transfers from DRAM-based main memory. However, in these architectures, DRAM is still responsible for half of the overall energy consumption of the system, on average. A key factor in the high energy consumption of DRAM is the refresh overhead, which is estimated to consume 40% of the total DRAM energy. In this paper, we propose a new mechanism, Refresh Triggered Computation (RTC), that exploits the memory access patterns of CNN applications to reduce the number of refresh operations. We propose three RTC designs (min-RTC, mid-RTC, and full-RTC), each of which requires a different level of aggressiveness in terms of customization to the DRAM subsystem. All of our designs have small overhead. Even the most aggressive RTC design (i.e., full-RTC) imposes an area overhead of only 0.18% in a 16 Gb DRAM chip and can have less overhead for denser chips. Our experimental evaluation on six well-known CNNs shows that RTC reduces average DRAM energy consumption by 24.4% and 61.3% for the least aggressive and the most aggressive RTC implementations, respectively. Besides CNNs, we also evaluate our RTC mechanism on three workloads from other domains. We show that RTC saves 31.9% and 16.9% DRAM energy for Face Recognition and Bayesian Confidence Propagation Neural Network (BCPNN), respectively. We believe RTC can be applied to other applications whose memory access patterns remain predictable for a sufficiently long time.

    Self-organisation and its application to binding

    This paper presents Kohonen’s self-organisation algorithm as an optimisation tool. Its application is illustrated by applying it to a high-level synthesis (HLS) problem, binding: the task of assigning operations to specific instances of functional units. It is a crucial problem, as it influences the interconnect, wiring and register cost. This Self-Organising Binder (SOB) has a built-in hill-climbing mechanism and, being based on a neural network, can be easily parallelised. We apply SOB to benchmark examples and show that the results are comparable to the best reported in the field.

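    Incidencia del TLCAN y de los acuerdos de protección a la inversión extranjera sobre las relaciones de México con la Unión Europea (The Impact of NAFTA and Foreign Investment Protection Agreements on Mexico's Relations with the European Union)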

    This article analyzes how NAFTA has affected Mexico’s transatlantic relations with the European Union following the implementation of the Global Agreement between Mexico and the EU (2000) and the Bilateral Investment Protection Agreements concluded between Mexico and some European countries since 1995. The working hypothesis is that over the last 13 years Mexico’s transatlantic relations have undergone a relative weakening driven by at least two factors. The first is the adoption of what is known as NAFTA Parity in the Global Agreement, which resulted in Mexico being used as a platform to redirect part of European goods and investment, mainly to the USA but also to the rest of North America. The second is the impact of the Treaty of Lisbon (2009), together with the debate among EU member states over the protection of European investment outside the EU and over the validity and relevance of the investment dispute settlement mechanisms that European companies commonly invoke against non-EU governments under the protection of a BIT/APPRI. This aspect has become a compelling argument in determining the investment strategies of EU companies in Mexico and Latin America, among other regions.

    A neural net based Self Organising Scheduling Algorithm

    Scheduling is a crucial task in behavioural synthesis and an NP-hard optimisation problem. Neural net computation paradigms bring potential for efficient solutions to such problems. This paper presents a new scheduling algorithm based on Kohonen’s rule for self-organisation. The algorithm has an inherent hill-climbing mechanism, copes with a comprehensive set of constraints and can be implemented on massively parallel structures. Its performance on well-known benchmark examples, presented in the paper, is on par with the best reported.